Abstract

Background: DNA methylation is one of the most phylogenetically widespread epigenetic modifications of 
genomic DNA. In particular, DNA methylation of transcription units ('gene bodies') is highly conserved across 
diverse taxa. However, the functional role of gene body methylation is not yet fully understood. A long-standing 
hypothesis posits that gene body methylation reduces transcriptional noise associated with spurious transcription of 
genes. Despite its plausibility, this hypothesis has not been explicitly tested until now.

Results: Using nucleotide-resolution data on genomic DNA methylation and abundant microarray data, here we 
investigate the relationship between DNA methylation and transcriptional noise. Transcriptional noise measured from 
microarrays scales down with expression abundance, confirming findings from single-cell studies. We show that gene 
body methylation is significantly negatively associated with transcriptional noise when examined in the context of 
other biological factors. 

Conclusions: This finding supports the hypothesis that gene body methylation suppresses transcriptional noise. Heavy 
methylation of vertebrate genomes may have evolved as a global regulatory mechanism to control for transcriptional 
noise. In contrast, promoter methylation exhibits positive correlations with the level of transcriptional noise. We 
hypothesize that methylated promoters tend to undergo more frequent transcriptional bursts than those that avoid 
DNA methylation. 

Background

DNA methylation at CpG dinucleotides is a key epigen- 
etic modification in the human genome crucial for regu- 
latory and developmental processes [1,2]. The degree of 
DNA methylation in the human genome is extensive: 
most CpG dinucleotides are methylated in most tissues 
and developmental stages examined [3-6]. In particular, 
transcription units, or so-called 'gene bodies', are even 
more heavily methylated than the surrounding intergenic 
regions [6-9]. 

The functional consequences of promoter methylation 
on chromatin configuration and transcriptional regulation 
are extensively documented (see, for example, [10-12]). 
There is also considerable evidence suggesting that DNA 
methylation suppresses proliferation of transposable ele- 
ments (TEs) [13-15]. However, the role of gene body 
methylation remains largely unresolved. Recently, studies 
have begun to identify molecular consequences of gene 
body methylation. For example, gene body methylation affects
pol II occupancy and histone modifications [16]. Differential
levels of DNA methylation between different
exons have been linked to differential inclusion and exclu- 
sion of specific exons in transcripts [17,18]. Gene body 
methylation may also occur as a byproduct of transcrip- 
tional processes [19]. Another possibility is that gene body 
methylation is simply an extension of methylation of TEs; 
many genes harbor TEs within their transcription units, 
and the main role of methylation is to suppress the prolif- 
eration of these TEs [15]. 

Nevertheless, the main role of gene body DNA methylation
remains unresolved. In fact, it is considered one of
the longest-standing open questions regarding genomic
DNA methylation [20-25]. This question is even more pertinent
in light of evolutionary patterns of DNA methylation.
Comparative DNA methylation studies indicate that
gene body methylation is the most conserved, ancestral 
form of genomic DNA methylation [7,9,23,26]. Thus, elu- 
cidating the role of gene body DNA methylation may pro- 
vide significant insights into the evolutionary divergence 
of genomic DNA methylation across taxa [9,23,26,27]. 

A long-standing hypothesis posits that gene body DNA
methylation suppresses spurious transcription within cod- 
ing regions. By doing so, gene body methylation can ef- 
fectively reduce 'transcriptional noise' [27,28]. This 
hypothesis is based upon the well-accepted idea that DNA 
methylation is generally repressive [29]. Pervasive DNA 
methylation of gene bodies, and the consequent suppres- 
sion of transcriptional noise, may have served as a key fa- 
cilitator enabling the evolution of complex vertebrate 
genomes [27]. Moreover, recent studies have begun to in- 
dicate that epigenetic mechanisms are deeply implicated 
in regulation of gene expression variability [30-33]. 

However, a detailed analysis of the relationship between 
transcriptional noise and DNA methylation has been lack- 
ing until now, due in large part to technical difficulties. 
Here, capitalizing on the recent progress in genomics and 
epigenomics, we investigated the impact of DNA methyla- 
tion on transcriptional noise, using data from the human 
genome. Our analyses provide, for the first time, unequivocal
evidence supporting the role of gene body
methylation in reducing transcriptional noise. Furthermore,
we show that promoter DNA methylation is also
highly significantly associated with transcriptional noise. 

Results 

Transcriptional noise is negatively correlated with
expression abundance and is associated with specific functions

Levels of gene expression vary between cells even with the 
same genetic materials and under the same biological con- 
ditions [34-36]. Understanding the nature and mechanism 
of such variability, which is commonly referred to as 'transcriptional
noise', has manifold functional consequences
[37]. Recently, there have been significant improvements
in experimental methods to measure transcriptional noise, 
as well as in the theoretical understanding of transcrip- 
tional noise. These studies indicate that transcriptional 
noise may occur due to transcriptional bursting of pro- 
moters, as well as spurious transcription within coding se- 
quences [38-41]. 

Transcriptional noise in multicellular organisms, such 
as mammals, cannot be easily dissected using experimental
means. However, it can be approximated using
abundant expression datasets, for instance by utilizing normalized
variation among microarray assays between replicates
of populations [42,43]. For example, Yin et al. [42]
compared the transcriptional noise measured from
microarrays to that measured from single-cell experiments.
The two results correspond remarkably well [42].
Similar results were seen in another study, comparing ex- 
pression variation among populations to experimentally 
measured transcriptional noise [43]. Following these ap- 
proaches, in this study we approximated transcriptional 
noise of human genes as the coefficient of variation of 
transcriptional abundance, assayed between replicates of
populations of the same tissue samples under normal conditions
(see Methods).
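
The noise measure described above can be sketched in a few lines. This is a minimal illustration rather than the study's pipeline: the gene names and expression values are hypothetical, and the use of the sample standard deviation is an assumption, since the Methods details are not reproduced here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical normalized expression values, one list per gene,
# one entry per replicate dataset of the same tissue.
expression = {
    "gene_a": [10.0, 10.0, 10.0, 10.0],  # no variation across replicates
    "gene_b": [8.0, 12.0, 8.0, 12.0],    # same mean, more variation
}

# Transcriptional noise approximated as the CV across replicates.
noise = {gene: coefficient_of_variation(v) for gene, v in expression.items()}
```

Both hypothetical genes have the same mean abundance, so the difference in their CV reflects only the spread across replicates.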

There have been significant recent technical improve- 
ments in analysis of genomic DNA methylation. In par- 
ticular, researchers have begun to generate whole-genome 
maps of DNA methylation at the nucleotide level, via 
whole-genome sequencing of bisulfite-converted genomic 
DNA [5,44,45]. This method quantifies the methylation 
level of each CpG dinucleotide across the whole genome, 
enabling us to discern gene body methylation levels for in- 
dividual genes. 

In this study, we analyzed DNA methylation and tran- 
scriptional noise of the prefrontal cortex (brain) and the 
peripheral blood mononuclear cells (blood). We chose 
these two tissues for the following reasons. First, we de- 
cided to analyze 'normal' tissues (as opposed to cell 
lines). While there exists vast information on transcrip- 
tional variation of cell lines, gene expression profiles of 
cell lines are known to have significantly diverged from 
those of normal tissues [46]. Consequently, we chose not 
to consider cell lines in the current study. Second, we 
chose tissues whose genome-wide methylation maps are 
currently available. Finally, large numbers of microarray
datasets under 'control' (as opposed to disease) conditions
exist for these tissues, thereby enabling us to measure
transcriptional noise with confidence. We used rigorous 
quality control processes to curate microarray data from 
these tissues (see Methods). The resulting data are from 
the same technical platforms, and exhibit high correlation 
levels among experiments (Additional files 1 and 2). 

We examined whether the transcriptional noise calcu- 
lated from these curated data exhibited similar proper- 
ties to those identified from previous studies. For 
example, from studies of yeast, genes involved in protein 
synthesis exhibited lower noise compared to other genes 
[39,40]. At the same time, genes responding to environ- 
mental signals or stress genes showed particularly high 
levels of noise [39,40]. We found similar patterns in the 
transcriptional noise of human genes (Additional file 3). 
One of the most striking findings from previous studies
was that transcriptional noise scales inversely with
expression abundance [39,40]. We observed
the same scaling behavior, in which transcriptional
noise was negatively associated with expression abundance
in both human tissues studied (Figure 1). This observation
indicates that the scaling of transcriptional
noise to expression abundance is likely to be a common
phenomenon across diverse taxa, underscoring common
molecular mechanisms, such as random birth and death
processes of mRNAs [39,40,47]. It has also been proposed
that transcriptional noise is minimized for essential
genes [48]. However, in our data, we did not
observe enrichment of low-noise genes among essential genes
(Additional file 4).
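
The link between a random birth and death process of mRNAs and this scaling can be made explicit with a standard idealization. The following is an illustration under stated assumptions, not a model fitted in this study: mRNA is produced at a constant rate $k$ and each molecule is degraded at rate $\gamma$ (both symbols are introduced here), so the steady-state copy number $n$ is Poisson distributed and its variance equals its mean.

```latex
\langle n \rangle = \frac{k}{\gamma}, \qquad
\sigma_n^{2} = \langle n \rangle, \qquad
\mathrm{CV} = \frac{\sigma_n}{\langle n \rangle} = \frac{1}{\sqrt{\langle n \rangle}}, \qquad
\log \mathrm{CV} = -\tfrac{1}{2}\,\log \langle n \rangle .
```

Under these assumptions, noise declines as abundance rises, with a slope of -1/2 when both quantities are plotted on logarithmic scales.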

Figure 1 Transcriptional noise and expression abundance are significantly negatively correlated in (A) brain and (B) blood.
Transcriptional noise is measured as the coefficient of variation of transcriptional abundance (see Methods section). The regression coefficients
between these variables are -0.60 (P < 0.001) and -0.55 (P < 0.001) for brain and blood, respectively.



Gene body methylation and promoter methylation 
exhibit negative and positive associations with 
transcriptional noise 

Our interest was in determining whether DNA methyla- 
tion influences transcriptional noise. To do so, we needed 
to first account for the effect of expression abundance on 
both of these variables. This is because DNA methylation 
is intimately related to expression abundance [6,10,23,25], 
and gene expression abundance is correlated with tran- 
scriptional noise (Figure 1). In addition, other genomic 
variables, such as gene lengths, are also correlated with ex- 
pression abundance [49,50]. 

Our goal was to explain the variation found in the levels 
of transcriptional noise using several explanatory (inde- 
pendent) variables. We used the following variables as ex- 
planatory variables: expression abundance, gene body 
methylation, promoter methylation, and gene lengths. We 
first examined the variance inflation factors (VIFs), which 
are indicators of multicollinearity among variables. None
of the explanatory variables exhibited VIFs greater than 5.
This indicated that we could assess the individual contributions
of each genomic trait without undue influence of
multicollinearity [51].
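
The VIF check described above can be sketched as follows. The VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. This is a dependency-free illustration under our own assumptions: the helper names and the toy predictor values are hypothetical, and the study presumably relied on standard statistical software.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors):
    """Variance inflation factor of each predictor: 1 / (1 - R_j^2)."""
    result = {}
    for name, column in predictors.items():
        others = [c for n, c in predictors.items() if n != name]
        result[name] = 1.0 / (1.0 - r_squared(column, others))
    return result


# Hypothetical per-gene predictor values (toy numbers, Pearson r = 0.8).
predictors = {
    "expression": [1.0, 2.0, 3.0, 4.0],
    "gene_body_methylation": [1.0, 3.0, 2.0, 4.0],
}
vifs = vif(predictors)
```

With only two predictors the VIF reduces to 1 / (1 - r^2), so a correlation of 0.8 gives a value below the threshold of 5 used in the text.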

We found that, in both tissues, gene body methylation
is significantly negatively associated with transcriptional
noise (Table 1). This is in accord with the hypothesis
that gene body DNA methylation suppresses transcrip- 
tional noise [27]. As gene length increases, there may be 
more opportunities for spurious transcription. In other 
words, gene length may be positively correlated with 
transcriptional noise. According to our multiple linear 
regression analysis, however, the effect of gene length on
transcriptional noise, while controlling for other factors,
was negligible in the brain data, but significantly negative
in the blood data (Table 1). Analyzing more tissue samples
would clarify the effect of gene length on transcriptional
noise. Interestingly, promoter methylation exhibited a
strong positive association with transcriptional noise in the
multiple linear regression setting (Table 1).

Table 1 Multiple linear regression models explaining
variation of transcriptional noise in different tissues

Predictors                   Estimate of β   t value    Significance   VIF
Brain
  Intercept                  1.47            19.51      <10^-4
  Expression abundance       -0.59           -180.50    <10^-4
  Gene body methylation^a    -0.28           -4.74      <10^-4
  Promoter methylation       0.20            4.94       <10^-4
  Log (gene length)^a        0.00092         0.099      0.921
  Adjusted R^2
Blood
  Intercept                  1.89            28.92      <10^-4
  Expression abundance       -0.55           -237.24    <10^-4
  Gene body methylation^a    -0.37           -6.68      <10^-4
  Promoter methylation       0.29            7.36       <10^-4
  Log (gene length)^a        -0.038          -5.09      <10^-4
  Adjusted R^2               0.92

^a Exclusive of transposable elements.
VIF, variance inflation factor.

In the above analyses, we analyzed gene body methylation
levels after removing TEs. We also sought to
include methylation of TEs specifically in our model,
using the following method. We first estimated methylation
levels of the portions of gene bodies that are identified as TEs
according to RepeatMasker [52]. Then we included this
methylation level of TEs found within each gene as a
separate variable in a multiple linear regression setting.
The length of TEs themselves within each gene could
not be included in this model because it exhibited
high VIFs (7.39 in brain and 6.62 in blood), and
thus could cause multicollinearity problems. The results
of this analysis, presented in Table 2, demonstrate that 
TE methylation is significantly negatively correlated with 
transcriptional noise. In other words, TE methylation 
may also contribute to reducing transcriptional noise. 
The regression coefficients of other variables are highly 
similar to those from Table 1, indicating that the effects 
of other variables are not highly influenced by the level 
of TE methylation. 

Table 2 Multiple linear regression models explaining
variation of transcriptional noise in different tissues

Predictors                   Estimate of β   t value    Significance   VIF
Brain
  Intercept                  1.47            19.57      <0.0001
  Expression abundance       -0.59           -180.78    <0.0001        1.12
  Gene body methylation^a    -0.19           -3.16      0.0016         2.34
  TE methylation             -0.23           -5.78      <0.0001        1.44
  Promoter methylation       0.18            4.54       <0.0001        1.28
  Log (gene length)^a        0.015           1.54       0.12           2.10
  Adjusted R^2               0.87
Blood
  Intercept                  1.87            28.65      <0.0001
  Expression abundance       -0.55           -236.94    <0.0001        1.11
  Gene body methylation^a    -0.28           -4.95      <0.0001        1.93
  TE methylation             -0.22           -5.77      <0.0001        1.43
  Promoter methylation       0.27            6.88       <0.0001        1.28
  Log (gene length)^a        -0.025          -3.19      0.0014         1.81
  Adjusted R^2               0.92

^a Exclusive of transposable elements.
TE, transposable element; VIF, variance inflation factor.

To verify that our results were not biased by statistical
outliers, we next performed robust regression analyses
using the same explanatory variables. We used
several available methods, including quantile regression
as well as several well-known loss functions such as
bisquare, Hampel, and Huber ([53-55]; see also
Methods section). The results of these analyses (Table 3
and Additional file 5) were consistent with
the previous results, indicating highly significant negative
associations between the level of gene body DNA
methylation and transcriptional noise, and highly
significant positive associations between promoter DNA
methylation and transcriptional noise. In conclusion,
these analyses reveal that, after controlling for other factors,
gene body methylation and promoter methylation
are negatively and positively correlated with transcriptional
noise, respectively.

Table 3 Robust regression analyses (quantile regression
for median) for the model used in Table 1

Predictors                   Estimate of β   t value    Significance
Brain
  Intercept                  1.53            19.51      <0.0001
  Expression abundance       -0.61           -188.65    <0.0001
  Gene body methylation^a    -0.26           -4.30      <0.0001
  Promoter methylation       0.13            3.76       0.0002
  Log (gene length)^a        0.0008          0.0885     0.3762
Blood
  Intercept                  1.82            28.75      <0.0001
  Expression abundance       -0.55           -237.24    <0.0001
  Gene body methylation^a    -0.28           -4.65      <0.0001
  Promoter methylation       0.20            5.38       <0.0001
  Log (gene length)^a        -0.03           -5.09      <0.0001

^a Exclusive of transposable elements.

Accounting for technical noise and among-individual
variability of DNA methylation

One potential caveat of our approach is the contribution of
technical noise, that is, variation of gene expression caused
by technical variation among experiments, to the measured level
of gene expression variability. Our interest is in the biological
variability of gene expression. As defined previously,
we approximated 'transcriptional noise' as the
coefficient of variation (CV) among the replicates of expression
data, following [42]. However, this
measure of gene expression variability is a composite of
biological noise, which is our main interest, and technical
variation among experiments. This is problematic
because technical noise might be confounded
with biological noise. For example, technical
variation among experiments is negatively correlated
with the expression level of genes [56,57]. Thus, it is important
to take into account the impact of technical
noise in assessing the relationship between biological
noise and DNA methylation.

To address this issue, we used a dataset on technical
and biological replicates of blood gene expression. In
this dataset, gene expression is measured in two sets of
technical replicates across two biological experiments
[58]. Using these data, we can decompose the total variation
of gene expression into 'biological' versus 'technical' variation.
Specifically, for a given gene, using y_{ij} as the
expression level of the jth technical replicate from the
ith biological sample, the decomposition of variance can be
expressed as in Equation 1 below:

\[
\sum_{i=1}^{2}\sum_{j=1}^{2}\left(y_{ij}-\bar{y}\right)^{2}
= 2\sum_{i=1}^{2}\left(\bar{y}_{i}-\bar{y}\right)^{2}
+ \sum_{i=1}^{2}\sum_{j=1}^{2}\left(y_{ij}-\bar{y}_{i}\right)^{2}
\qquad (1)
\]

Here \bar{y}_{i} denotes the mean over the technical replicates of the ith
biological sample, and \bar{y} denotes the mean over all measurements of the gene.

The left term represents the total sum of squares for a
gene; the first term on the right-hand side is the biological
sum of squares and the second term is the technical
sum of squares. Using this decomposition, we can
then assess the effect of gene body methylation on the
pure biological variation and on the technical variation,
separately. In our first analysis, we used the biological
sum of squares as the response variable, and examined
the statistical effects of several predictor variables. The
results of this analysis showed that gene body methylation
has a significant negative effect on biological variability
among samples (model 1 in Table 4). In the
second analysis, we used the total sum of squares as the
response variable and the technical sum of squares as an
explanatory variable. The results from this analysis again
indicated that the effect of gene body methylation on
'biological' transcriptional noise, after adjusting for the
technical noise, is negative (model 2 in Table 4). Thus,
both methods provide consistent support for our finding
that gene body methylation is negatively correlated with
biological variation of gene expression.
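
The decomposition described above can be checked numerically with a short sketch. The function name and the expression values are hypothetical; the layout (two biological samples, each with two technical replicates) follows the description of the dataset in the text.

```python
def decompose(samples):
    """Split one gene's total sum of squares into biological and technical parts.

    `samples` holds one list of technical replicates per biological sample.
    """
    values = [y for reps in samples for y in reps]
    grand_mean = sum(values) / len(values)
    sample_means = [sum(reps) / len(reps) for reps in samples]

    # Total: every measurement against the grand mean.
    total = sum((y - grand_mean) ** 2 for y in values)
    # Biological: sample means against the grand mean, weighted by replicate count.
    biological = sum(
        len(reps) * (m - grand_mean) ** 2 for reps, m in zip(samples, sample_means)
    )
    # Technical: each measurement against the mean of its own biological sample.
    technical = sum(
        (y - m) ** 2 for reps, m in zip(samples, sample_means) for y in reps
    )
    return total, biological, technical


# Hypothetical expression values for one gene.
total, biological, technical = decompose([[10.0, 12.0], [15.0, 17.0]])
```

For these toy values the biological component dominates, and the two components sum to the total, as the identity requires.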

Table 4 Multiple linear regression models in which
technical versus biological components of transcriptional
noise are separately analyzed

Predictors                 Estimate of β   t value    Significance   VIF
Model 1^a
  Intercept                1.201           14.12      <10^-4
  Expression               -0.442          -78.19     <10^-4         1.06
  Gene body methylation    -0.797          -7.33      <10^-4         1.07
  Promoter methylation     0.613           6.17       <10^-4         1.06
  Adjusted R^2             0.53
Model 2^b
  Intercept                0.769           11.157     <10^-4
  Expression               -0.337          -61.354    <10^-4         3.39
  Gene body methylation    -0.566          -9.463     <10^-4         1.10
  Promoter methylation     0.431           7.969      <10^-4         1.07
  Technical noise          0.608           32.467     <10^-4         3.30
  Adjusted R^2             0.82

^a Model 1 used CV calculated from biological component as response variable.
^b Model 2 used CV calculated from total variation as response variable.
CV, coefficient of variation; VIF, variance inflation factor.



Another source of variability that needs to be accounted 
for is variation of DNA methylation between individuals. 
To determine the influence of between-individual variabil- 
ity of DNA methylation on our results, we analyzed 
datasets on gene body DNA methylation from the brains 
of three individuals [6]. We constructed an augmented re- 
gression model, allowing the effect of gene body methyla- 
tion to vary across individuals. We defined an index for 
each individual as an 'individual factor' and included it in 
the new model. In addition, we included interaction terms
between individual factors and gene body methylation in
this model. The results of these analyses (Table 5 and
Additional file 6) indicate that between-individual variations
of gene body methylation do not affect our findings.
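
One way to write the augmented model described above is sketched below. The notation is ours, not the authors', and the exact parameterization used in the study may differ: $N_g$ is the transcriptional noise of gene $g$, $E_g$ its expression abundance, $L_g$ its gene length, $P_g$ its promoter methylation, $M_{gk}$ its gene body methylation in individual $k$, $\alpha_k$ the individual factor, and $\gamma_k$ the individual-specific adjustment to the gene body methylation slope.

```latex
N_{g} = \beta_0 + \beta_1 E_{g} + \beta_2 L_{g} + \beta_3 M_{gk} + \beta_4 P_{g}
        + \alpha_k + \gamma_k M_{gk} + \varepsilon_{gk}
```

In this formulation, testing whether the $\alpha_k$ and $\gamma_k$ terms differ from zero corresponds to the 'Individual' and interaction rows of Table 5.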

Discussion 

The human genome and other vertebrate genomes are 
heavily methylated in most tissues and developmental 
stages, a pattern referred to as 'global' DNA methylation 
[23]. This pattern is very different from what is observed 
in other animals and plants. In most invertebrates exam- 
ined, DNA methylation is targeted to the transcription 
units (gene bodies) of a subset of genes [7,9,23]. Notably, 
gene body methylation appears to have existed well be- 
fore the emergence of DNA methylation of promoters 
and TEs, as an ancestral form of DNA methylation in di- 
verse animal and plant genomes [23,60,61]. 

Determining the role of gene body methylation is of 
much interest, and studies are revealing associations be- 
tween gene body methylation and gene expression 
[9,21,62,63], transcript composition [17,18,64] and chro- 
matin structures [16]. Nonetheless, the global role of 
gene body methylation remains unresolved. In this re- 
spect, two long-standing hypotheses stand out. The first 
hypothesis posits that gene body methylation reduces 
transcriptional noise [27]. The second hypothesis focuses
on the role of DNA methylation in suppressing the proliferation
of TEs [15]. Many TEs are found in gene bodies;
thus, methylation of TEs may have caused expansive
methylation of gene bodies [15]. 

In this study we examined the predictions of these two 
hypotheses using whole genome methylation data and 
statistical methods. Because gene body methylation and 
transcriptional noise are both significantly correlated 
with expression abundance, it is important to analyze 
the impact of gene body methylation while considering 
the effect of expression abundance. We used several 
statistical methods to achieve this goal. We also examined
the impact of noise due to technical variation
among experiments, as well as between-individual variation
of DNA methylation, on our results. These analyses
all indicate that gene body methylation, when viewed in 
the context of other biological factors, has a negative re- 
lationship with transcriptional noise. 

Table 5 Regression analysis accounting for individual variation indicates little effect of between-individual variability
of DNA methylation on transcriptional noise

Predictors                              Sum of squares   Degrees of freedom (df)   F value     P value   VIF^a
Intercept                               214.6                                      578.390     <10^-4    1.03
Expression                              28,164.2                                   75,900.35   <10^-4    1.30
Gene length                             7.6                                        20.352      <10^-4    2.00
Gene body methylation                   17.3                                       46.541      <10^-4    1.04
Promoter methylation                    43.2                                       116.420     <10^-4    3.89
Individual                              0.3              2                         0.430       0.651     4.05
Individual × gene body methylation      0.3              2                         0.455       0.634
Adjusted R^2                            0.87

^a Variance inflation factor (VIF) approximated as (generalized VIF)^(1/(2 × df)) [59].



Transcriptional noise is abundantly present in diverse 
taxa. The origin of transcriptional noise may be related 
to 'transcriptional bursts', referring to the phenomenon 
that transcription tends to occur in bursts [65-67]. Tran- 
scriptional noise also occurs due to transcription of non- 
canonical promoters within gene bodies, potentially due 
to the overabundance of RNA polymerase II in the cellular
environment [38]. Our results showing that more heavily
methylated gene bodies exhibit less transcriptional noise 
are consistent with the idea that transcriptional noise is 
reduced by pervasive gene body methylation. Alterna- 
tively, the negative relationship between gene body DNA 
methylation and transcriptional noise may reflect an indirect
association due to one or more as yet unknown biological
factors that influence both variables.

The details of the underlying molecular mechanisms
are yet to be fully characterized.
Some well-established epigenetic modifications
of gene bodies have been shown to directly suppress the initiation
of non-canonical transcripts within coding sequences
[68-70]. Emerging evidence indicates that gene
body DNA methylation is likely to complement or function
together with other epigenetic modifications to generate
chromatin states that are repressive of the initiation
and elongation of spurious transcripts. For example, DNA
methylation of gene bodies reduces the efficiency of transcriptional
elongation, by excluding RNA polymerase II
occupancy and recruiting several repressive histone marks
[16]. Gene body DNA methylation effectively excludes deposition
of the histone variant H2A.Z, which tends to mark
lowly expressed genes with high expression variability
among tissues and biological conditions [71]. The identities
of molecular components of the crosstalk between
DNA methylation and histone modifications continue to
be discovered (see, for example, [72]).

Interestingly, our analyses indicate that promoter DNA 
methylation is positively correlated with the level of tran- 
scriptional noise. The underlying molecular mechanism of 
this phenomenon is of great interest. One possibility is
that this is related to the intrinsic susceptibility of specific
promoters toward transcriptional bursting. In the simplest 
case, promoters appear to switch randomly between 'ON' 
and 'OFF' states with respect to the initiation of transcrip- 
tion [37,47]. Some promoters, however, remain perpetu- 
ally in the 'ON' state (permissive to transcription) and do 
not exhibit bursting [73]. Such promoters exhibit less 
transcriptional variability compared to those undergoing 
switches between different transcriptional states [73]. In 
other words, the degree of transcriptional bursting likely 
varies between promoters according to their propensity 
toward different transcriptional states, leading to different 
levels of transcriptional noise among genes. 

Given that there exists considerable evidence that unmethylated
promoters can maintain a 'permissive' chromatin
state [72,74], we hypothesize the following: promoters
with lower levels of DNA methylation are more likely to
adopt and maintain a permissive transcriptional state 
(similar to the 'ON' state referred to above) and exhibit lit- 
tle transcriptional bursting. However, promoters that are 
more susceptible to DNA methylation may be more likely 
to undergo stochastic fluctuations between different 
states, facilitating transcriptional bursts, and as a conse- 
quence exhibit increased transcriptional noise. The actual 
molecular mechanisms underlying these processes are 
again likely to involve highly orchestrated interactions be- 
tween DNA methylation and other epigenetic mecha- 
nisms: in particular, studies in yeast have revealed the role 
of nucleosome positioning in regulation of gene expres- 
sion variability [31,33]. 

Reducing transcriptional noise is particularly import- 
ant for genes that perform housekeeping functions and 
are therefore constantly expressed [28]. Indeed, methyla- 
tion maps of distantly related animal genomes reveal 
that gene body methylation usually targets genes that 
function in 'housekeeping' cellular processes [26,28,75]. 
Thus, we hypothesize that gene body methylation func- 
tions as a primary mechanism to suppress transcriptional 
noise of essential housekeeping genes in diverse 
organisms. Gene body DNA methylation is the main
mode of DNA methylation in many invertebrate species. 
Reducing transcriptional noise may serve as the primary 
function of DNA methylation in such genomes. Further- 
more, the human genome is characterized by heavy 
transcription of non-coding regions [76,77]. Global 
methylation of the whole genome may have evolved as a 
molecular mechanism to reduce global transcriptional 
noise [27]. 

Moreover, we found that methylation of TEs within 
gene bodies also contributes to the suppression of tran- 
scriptional noise. Several studies now indicate that 
methylation of TEs may have evolved after the evolution 
of gene body methylation [23,61]. It will be interesting 
to determine whether the origin of TE methylation is re- 
lated to its function to reduce intragenic transcriptional 
noise. Our study cannot provide a clear resolution to 
this question. Analyses of genomic methylation patterns
of species straddling the invertebrate-vertebrate boundary
(near the origin of global DNA methylation) will be
informative for determining the evolutionary sequence of
these processes.

DNA methylation is known to vary among different 
tissues [5,6]. Given the potential role of gene body 
methylation in regulating transcriptional noise, it is pos- 
sible that among-tissue variation of DNA methylation 
levels may be related to among-tissue variation of tran- 
scriptional noise. In our data, the prefrontal cortex 
(brain) exhibited higher methylation levels than blood 
(P < 0.0001 by Mann-Whitney test, Figure 2). Since gene
body DNA methylation is negatively correlated with tran- 
scriptional noise, we tested whether the brain exhibits 
lower noise compared to blood. Indeed, we found that 
prefrontal cortex samples (brain) exhibited significantly 
lower transcriptional noise compared to blood samples 
(Figure 2). Thus, regulation of transcriptional noise may
be one mechanism determining tissue-specific or cell
type-specific levels of gene body DNA methylation. 

Conclusion 

We explored the relationship between transcriptional 
noise and DNA methylation, using gene expression vari- 
ability among different populations of cells as a proxy 
for transcriptional noise. Our analysis confirms the in- 
verse relationship between gene expression abundance 
and transcriptional noise, while revealing novel relation- 
ships between DNA methylation and transcriptional 
noise. In particular, gene body DNA methylation exhibits
a negative correlation with transcriptional noise. This 
observation supports a longstanding hypothesis that 
gene body DNA methylation may reduce transcriptional 
noise. In light of evolutionary findings that gene body 
methylation is a widespread, conserved form of DNA 
methylation, the ancestral role of DNA methylation may 
have been related to the reduction of transcriptional 
noise. On the other hand, promoter DNA methylation is 
positively related to transcriptional noise, raising the 
possibility that epigenetic status of promoters may affect 
transcriptional bursts. 

Methods 

Data sources 

Gene expression data were obtained from the National Center
for Biotechnology Information (NCBI) Gene Expression
Omnibus (http://www.ncbi.nlm.nih.gov/geo/) (Additional
file 7). Because there is considerable technical variation
between platforms, we restricted our analysis to the
Affymetrix Human Genome U133 series. After quality
control, we obtained a total of 52 datasets (12 datasets for 
the prefrontal cortex and 40 datasets for blood). Gene 
lengths were determined based upon the RefSeq annotation
provided by the UCSC genome browser. Nucleotide-resolution
whole-genome DNA methylation maps of the human
prefrontal cortex (brain) were obtained from a recent
study ([6], data available at NCBI Gene Expression Omnibus
under the record number GSE37202). DNA methylation maps of
mature peripheral blood mononuclear cells were from Li
et al. [44], generated using a similar method.

Figure 2 Comparison of gene expression noise and DNA methylation between studied tissues. (A) Comparison of mean transcriptional
noise between the two tissues. The brain exhibits significantly lower transcriptional noise compared to blood (paired t test, P < 0.001). (B) Methylation
levels, however, are significantly higher in the brain compared to blood (paired t test, P < 0.001). We only used genes for which methylation and transcriptional
noise data exist in both tissues (total no. of genes = 3,644).

DNA methylation 

To obtain gene body methylation levels of non-repetitive 
portions of genes, we used the annotation of TEs from 
the RepeatMasker database (http://www.repeatmasker.org).
A custom Perl script was used to mask the TEs in
gene bodies. For each mapped cytosine, the fractional
methylation value was calculated as: total number of 'C'
reads / (total number of 'C' reads + total number of 'T'
reads), following previous studies [5,8,44]. We then
calculated the fractional methylation level of each
transcription unit, using the RefSeq database of hg18. Gene
body methylation level for each gene was estimated as 
the mean fractional methylation value for all the mapped 
cytosines within each transcription unit. When alterna- 
tive transcripts were present, we chose the longest tran- 
script for each gene. The promoter methylation level for 
each gene was estimated as fractional methylation for re- 
gions spanning 1,500 bp upstream and 500 bp down- 
stream of the transcription start site (TSS), similar to 
Zeng et al. [6].
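
The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the custom Perl script used in the study: the function names, the (position, C reads, T reads) input layout, the half-open coordinate convention and the strand handling of the promoter window are all assumptions made for the example.

```python
from statistics import mean


def fractional_methylation(c_reads, t_reads):
    """Fractional methylation of one cytosine: C reads / (C reads + T reads)."""
    total = c_reads + t_reads
    if total == 0:
        return None  # no coverage at this site, so it is skipped
    return c_reads / total


def region_methylation(sites, start, end, masked=()):
    """Mean fractional methylation of covered cytosines in [start, end).

    sites:  iterable of (position, c_reads, t_reads) tuples (hypothetical layout)
    masked: iterable of (mask_start, mask_end) half-open intervals to exclude,
            for example TE annotations
    """
    values = []
    for pos, c_reads, t_reads in sites:
        if not (start <= pos < end):
            continue
        if any(m_start <= pos < m_end for m_start, m_end in masked):
            continue
        frac = fractional_methylation(c_reads, t_reads)
        if frac is not None:
            values.append(frac)
    return mean(values) if values else None


def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Promoter interval around the TSS; strand handling is an assumption."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream
```

For a gene body, `region_methylation` would be called with the transcription unit coordinates and the TE intervals as `masked`; for a promoter, with the interval returned by `promoter_window`.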

Microarray data processing 

Raw intensities in the microarray data files were first
processed using the MAS5.0 method [78]. Using other
normalization methods provided similar results. We
used the median probe intensity assigned to each gene
as its expression level. We then analyzed pairwise
correlations between samples to assess similarities between
datasets from the same tissue. Datasets within the same
tissue exhibiting correlation coefficients greater than 0.8
were included in this study (Additional files 1 and 2).
Quantile normalization using the R package preprocessCore
[79] was conducted within each tissue. Transcriptional
noise was defined as the coefficient of variation (CV:
standard deviation/mean) of transcriptional abundance
within each tissue, following Yin et al. [42].
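
As an illustration of the last two steps, the following Python sketch quantile-normalizes a set of arrays and then computes the CV of each gene across them. It is not the preprocessCore implementation: ties are broken arbitrarily here, the sample standard deviation (n - 1 denominator) is an assumption, and all function names are hypothetical.

```python
from statistics import mean, stdev


def quantile_normalize(samples):
    """Force every sample (one list of gene values per array) onto a common distribution."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples of the values at each rank.
    reference = [mean([s[rank] for s in sorted_samples]) for rank in range(n_genes)]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


def coefficient_of_variation(values):
    """CV = standard deviation / mean."""
    return stdev(values) / mean(values)


def transcriptional_noise(samples):
    """Per-gene CV across the quantile-normalized samples of one tissue."""
    normalized = quantile_normalize(samples)
    n_genes = len(normalized[0])
    return [coefficient_of_variation([s[i] for s in normalized]) for i in range(n_genes)]
```

Calling `transcriptional_noise` once per tissue, on the arrays that passed the correlation filter, gives one noise value per gene per tissue.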

Multiple linear regression models of transcriptional noise 

We performed multiple linear regression analyses to elu- 
cidate relationships between transcriptional noise and 
several biological factors (gene expression abundance, 
gene body methylation, promoter methylation, and gene 
lengths) simultaneously. CV and gene length were log 
transformed to improve normality. Our analyses indicated
that gender was not a significant variable, and it was
thus excluded from further analyses. We also examined
the significance of the interaction terms between
predictors. The interaction terms
were generally not significant and were therefore
removed from subsequent analyses.
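
The structure of this model can be sketched as an ordinary least squares fit of log(CV) on expression abundance, gene body methylation, promoter methylation and log(gene length). The Python below is a minimal sketch under stated assumptions: it solves the normal equations directly, uses the natural logarithm (the base is not specified above), reports only coefficients (no P values), and the dictionary keys and function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_noise_model(genes):
    """Regress log(CV) on expression, gene body and promoter methylation, log(length)."""
    rows = [
        [g["expression"], g["gene_body_meth"], g["promoter_meth"], math.log(g["length"])]
        for g in genes
    ]
    y = [math.log(g["cv"]) for g in genes]
    return ols(rows, y)
```

In practice a statistics package would be used so that standard errors and significance tests are available; the sketch is meant only to make the response and predictors explicit.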

Robust regression analysis was performed using various
loss functions. We summarize the results of quantile
regression in Table 3. We also used other well-known
loss functions such as bisquare, Hampel and Huber
(Additional file 6) [53-55]. All these approaches provided
results consistent with those of the ordinary least squares
method. We therefore conclude that the significance
and magnitude of the effects of the explanatory variables
are robust to the choice of regression method.
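
To illustrate how a robust loss function differs from ordinary least squares, the sketch below fits a line with the Huber loss by iteratively reweighted least squares. It is a simplified, single-predictor example and not the procedure used for the reported analyses: the tuning constant 1.345, the rescaled median absolute residual used as the scale estimate, the fixed number of iterations and the function names are all assumptions.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def huber_line(x, y, k=1.345, iterations=50):
    """Huber regression by iteratively reweighted least squares."""
    weights = [1.0] * len(x)
    intercept, slope = weighted_line(x, y, weights)  # start from the OLS fit
    for _ in range(iterations):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking as k*scale/|r| outside.
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        intercept, slope = weighted_line(x, y, weights)
    return intercept, slope
```

With all weights equal to 1, `weighted_line` is the ordinary least squares fit, so comparing its output with that of `huber_line` on the same data shows how strongly outlying genes influence the estimates.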

Functional enrichment analyses 

Functional enrichment of specific subsets of
genes was assessed using the DAVID tools
(http://david.abcc.ncifcrf.gov/) [80]. We used the list of genes
included in our analyses as the background, and tested
enrichment of specific gene ontology terms using the GO
FAT annotation. We examined the mean transcriptional
noise of genes in the two tissues and investigated the
gene ontology terms enriched among the genes with the
highest 5% and the lowest 5% of transcriptional noise.
A Benjamini multiple testing correction of the
EASE score (a modified Fisher exact P value) was used
to determine statistical significance of gene enrichment.
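
The statistics involved can be sketched as follows. This is not the DAVID implementation: it assumes that the EASE score is the one-sided Fisher exact (hypergeometric) P value computed after removing one gene from the list hits, and that the Benjamini correction is the Benjamini-Hochberg step-up adjustment. The function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, list_size, term_size, background_size):
    """P(X >= k), X = number of term genes among list_size genes drawn from the background."""
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    favorable = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favorable / total


def ease_score(hits, list_size, term_size, background_size):
    """Conservative Fisher exact P value: one list hit is removed before testing."""
    return hypergeom_upper_tail(hits - 1, list_size, term_size, background_size)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

Here `list_size` would be the number of genes in the high-noise or low-noise subset, `term_size` the number of background genes annotated with a given GO term, and `hits` the number of subset genes carrying that term.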

Additional files 



Additional file 1: Correlation between 12 brain microarray datasets 
used. 

Additional file 2: Correlation between blood microarray datasets 
used. In the interest of space, we only show 12 microarray datasets.
The remaining data exhibit similarly high correspondence between 
datasets. 

Additional file 3: GO enrichment analyses of genes exhibiting high 
or low transcriptional noise. 

Additional file 4: No enrichment of low noise genes according to 
gene essentiality. 

Additional file 5: Robust regression analysis using transcriptional 
noise as a response variable and other biological variables as 
explanatory variables. 

Additional file 6: Multiple linear regression analyses incorporating 
individual factors indicate little effect of between-individual DNA 
methylation on transcriptional noise. 

Additional file 7: List of microarray datasets used in this study. 